support websocket for exec and attach #1305

Closed
wants to merge 1 commit into from

Conversation

@duguhaotian duguhaotian commented Dec 5, 2023

What type of PR is this?

Add one of the following kinds:
/kind feature

Optionally add one or more of the following kinds if applicable:
/kind api-change

What this PR does / why we need it:

Which issue(s) this PR fixes:

#1296

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Adds a new option for exec/attach. Usage: crictl exec/attach -w -it ${container-id} ls

test

I tested with iSulad as the container engine; its streaming server is implemented with WebSocket.

$ export CRICTL_REMOTE_COMMAND_WEBSOCKETS=true
$ mycrictl exec -w -it b277916602c7 ls
bin   dev   etc   home  proc  root  sys   tmp   usr   var

# tcpdump logs
08:53:15.674128 IP localhost.10350 > localhost.40462: Flags [.], ack 303, win 510, options [nop,nop,TS val 1395824017 ecr 1395824017], length 0
E..4o.@.@..,........(n...       ..T=.......(.....
S2..S2..
08:53:15.675140 IP localhost.10350 > localhost.40462: Flags [P.], seq 1:173, ack 303, win 512, options [nop,nop,TS val 1395824018 ecr 1395824017], length 172
E...o.@.@...........(n...       ..T=.............
S2..S2..HTTP/1.1 101 Switching Protocols
Upgrade: WebSocket
Connection: Upgrade
Sec-WebSocket-Accept: mvvvKmgjEhY8040cH4xt8YSYr+Y=
Sec-WebSocket-Protocol: v5.channel.k8s.io


08:53:15.675156 IP localhost.40462 > localhost.10350: Flags [.], ack 173, win 511, options [nop,nop,TS val 1395824018 ecr 1395824018], length 0
E..4|w@[email protected]..........(nT=...   .......(.....
S2..S2..
08:53:15.675527 IP localhost.40462 > localhost.10350: Flags [P.], seq 303:336, ack 173, win 512, options [nop,nop,TS val 1395824018 ecr 1395824018], length 33
E..U|x@.@..(..........(nT=...   .......I.....
S2..S2....S..mW..::...q..^g..%6...'..^c..
08:53:15.716065 IP localhost.10350 > localhost.40462: Flags [.], ack 336, win 512, options [nop,nop,TS val 1395824059 ecr 1395824018], length 0
E..4o.@.@..*........(n...       ..T=.......(.....
S2..S2..
08:53:15.755820 IP localhost.10350 > localhost.40462: Flags [P.], seq 173:337, ack 336, win 512, options [nop,nop,TS val 1395824099 ecr 1395824018], length 164
E...o.@.@...........(n...       ..T=.............
S2..S2...~....[1;34mbin.[m   .[1;34mdev.[m   .[1;34metc.[m   .[1;34mhome.[m  .[1;34mproc.[m  .[1;34mroot.[m  .[1;34msys.[m   .[1;34mtmp.[m   .[1;34musr.[m   .[1;34mvar.[m

08:53:15.796061 IP localhost.40462 > localhost.10350: Flags [.], ack 337, win 512, options [nop,nop,TS val 1395824139 ecr 1395824099], length 0
E..4|y@[email protected]..........(nT=...   .2.....(.....
S2..S2..
08:53:15.856882 IP localhost.10350 > localhost.40462: Flags [P.], seq 337:360, ack 336, win 512, options [nop,nop,TS val 1395824200 ecr 1395824139], length 23
E..Ko.@.@...........(n...       .2T=.......?.....
S2.HS2.....{"status":"Success"}
08:53:15.856891 IP localhost.40462 > localhost.10350: Flags [.], ack 360, win 512, options [nop,nop,TS val 1395824200 ecr 1395824200], length 0
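
For readers following the capture, the `Sec-WebSocket-Accept` header in the 101 response is derived from the client's `Sec-WebSocket-Key` as specified by RFC 6455. The sketch below is only an illustration of that derivation; the function name `acceptKey` is made up for this example, and the input is the sample key from the RFC, not the key from the capture above (the request headers are not shown there).

```go
package main

import (
	"crypto/sha1"
	"encoding/base64"
	"fmt"
)

// websocketGUID is the fixed GUID that RFC 6455 appends to the client key.
const websocketGUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

// acceptKey (hypothetical helper name) derives the Sec-WebSocket-Accept
// value a server returns for a given client Sec-WebSocket-Key.
func acceptKey(clientKey string) string {
	sum := sha1.Sum([]byte(clientKey + websocketGUID))
	return base64.StdEncoding.EncodeToString(sum[:])
}

func main() {
	// Sample key from RFC 6455, not taken from the tcpdump output.
	fmt.Println(acceptKey("dGhlIHNhbXBsZSBub25jZQ=="))
}
```

A client can use the same computation to verify that the 101 response really answers its own handshake request.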

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Dec 5, 2023
@k8s-ci-robot
Contributor

Welcome @duguhaotian!

It looks like this is your first PR to kubernetes-sigs/cri-tools 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/cri-tools has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Dec 5, 2023
@duguhaotian duguhaotian force-pushed the master branch 2 times, most recently from e047447 to 181cf6e Compare December 6, 2023 02:47
if err != nil {
return err
}
executor, err = remoteclient.NewFallbackExecutor(websocketExecutor, executor, httpstream.IsUpgradeFailure)
Member

Is this the intended behavior for crictl? Should we just use the WebSocket executor when it is requested, without the fallback? It may be better for the end user to be able to explicitly ask for WebSocket.

Author

fixed

id: id,
tty: c.Bool("tty"),
stdin: c.Bool("stdin"),
websocket: c.Bool("websocket"),
Member

Would the name use-websocket be better? Or maybe have something with three options: --transport=spdy|websocket|default, where default tries WebSocket and falls back to SPDY?
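
A minimal sketch of how such a three-valued option could be validated, assuming the value arrives as a plain string. The `Transport` type, its constants, and `parseTransport` are hypothetical names for illustration only and are not part of crictl.

```go
package main

import (
	"fmt"
	"strings"
)

// Transport is a hypothetical type naming the requested streaming transport.
type Transport string

const (
	TransportSPDY      Transport = "spdy"
	TransportWebSocket Transport = "websocket"
	TransportDefault   Transport = "default" // WebSocket first, then SPDY
)

// parseTransport validates a hypothetical --transport flag value.
// An empty value is treated as "default" in this sketch.
func parseTransport(value string) (Transport, error) {
	switch t := Transport(strings.ToLower(strings.TrimSpace(value))); t {
	case TransportSPDY, TransportWebSocket, TransportDefault:
		return t, nil
	case "":
		return TransportDefault, nil
	default:
		return "", fmt.Errorf("unknown transport %q, expected one of: spdy|websocket|default", value)
	}
}

func main() {
	for _, v := range []string{"spdy", "WebSocket", "", "quic"} {
		t, err := parseTransport(v)
		fmt.Printf("%q -> %q, err=%v\n", v, t, err)
	}
}
```

Rejecting unknown values up front keeps a typo from silently selecting a transport the user did not ask for.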

Author

fixed

@duguhaotian
Author

I don't know why the runc and CRI-O critest checks failed.

The reported error is that the SPDY upgrade failed. Why?

Comment on lines 193 to 198
we, err := remoteclient.NewWebSocketExecutor(&rest.Config{TLSClientConfig: rest.TLSClientConfig{Insecure: true}}, "GET", url.String())
framework.ExpectNoError(err, "failed to create websocket executor for %q", execServerURL)

e, err = remoteclient.NewFallbackExecutor(we, e, httpstream.IsUpgradeFailure)
framework.ExpectNoError(err, "failed to create fallback executor for %q", execServerURL)

Member

This is making the tests fail in the CI. Do runtimes have to support that?

Author

If WebSocket fails, it falls back to SPDY, which is the same behavior as before. Why would this make CI fail?
I think IsUpgradeFailure causes the WebSocket upgrade error to be ignored.

Author

@duguhaotian duguhaotian Dec 7, 2023

Also, the error "unable to upgrade connection: 404 page not found" is reported by SPDY (vendor/k8s.io/apimachinery/pkg/util/httpstream/spdy/roundtripper.go#NewConnection).

Author

Let's just remove this for now and wait until runtimes support it.

Member

@saschagrunert saschagrunert left a comment

How can we test this? Using CRI-O main always returns an error on exec:

sudo ./build/bin/linux/amd64/crictl exec -p websocket 20168cc1a0aefa251fed echo hi
FATA[0000] execing command in container: unable to upgrade streaming request: websocket: bad handshake

CRI-O logs:

ERRO[2023-12-08 09:28:57.322412994+01:00] unable to upgrade websocket connection: websocket server finished before becoming ready

The error is coming from:

https://github.com/cri-o/cri-o/blob/f40da54704c5c32fb24b2b0e204424afae5b0dc2/vendor/k8s.io/apimachinery/pkg/util/httpstream/wsstream/conn.go#L223-L225

We're vendoring Kubernetes v1.29.0-rc.1 right now, so I'm wondering which part is missing what 🤔

Comment on lines 74 to 75
Usage: "transport protocol, One of: spdy|websocket|default",
Value: "default",
Member

I think we should keep spdy as default value and remove default.

Author

fixed

@duguhaotian
Author

How can we test this? Using CRI-O main always returns an error on exec:

sudo ./build/bin/linux/amd64/crictl exec -p websocket 20168cc1a0aefa251fed echo hi
FATA[0000] execing command in container: unable to upgrade streaming request: websocket: bad handshake

CRI-O logs:

ERRO[2023-12-08 09:28:57.322412994+01:00] unable to upgrade websocket connection: websocket server finished before becoming ready

The error is coming from:

https://github.com/cri-o/cri-o/blob/f40da54704c5c32fb24b2b0e204424afae5b0dc2/vendor/k8s.io/apimachinery/pkg/util/httpstream/wsstream/conn.go#L223-L225

We're vendoring Kubernetes v1.29.0-rc.1 right now, so I'm wondering which part is missing what 🤔

I think CRI-O needs to update its WebSocket code, for example to support 'v5.channel.k8s.io'.

@saschagrunert
Member

I think CRI-O needs to update its WebSocket code, for example to support 'v5.channel.k8s.io'.

I may miss something here, but we use the code vendored from Kubernetes: https://github.com/cri-o/cri-o/blob/f40da54704c5c32fb24b2b0e204424afae5b0dc2/go.mod#L75-L80

@duguhaotian
Author

duguhaotian commented Dec 11, 2023

I think CRI-O needs to update its WebSocket code, for example to support 'v5.channel.k8s.io'.

I may miss something here, but we use the code vendored from Kubernetes: https://github.com/cri-o/cri-o/blob/f40da54704c5c32fb24b2b0e204424afae5b0dc2/go.mod#L75-L80

Can we get more logging from CRI-O?

Why did the HTTP server fail? Code:
https://github.com/cri-o/cri-o/blob/f40da54704c5c32fb24b2b0e204424afae5b0dc2/vendor/k8s.io/apimachinery/pkg/util/httpstream/wsstream/conn.go#L214C3-L214C3

@saschagrunert
Member

saschagrunert commented Dec 11, 2023

@duguhaotian it returns there: https://github.com/cri-o/cri-o/blob/f40da54704c5c32fb24b2b0e204424afae5b0dc2/vendor/golang.org/x/net/websocket/server.go#L84

With the error message:

requested protocol(s) are not supported: [v5.channel.k8s.io]; supports [ channel.k8s.io base64.channel.k8s.io v4.channel.k8s.io v4.base64.channel.k8s.io]

@duguhaotian is it intentional that runtimes only support v1-v4? Ref: https://github.com/kubernetes/client-go/blob/84a6fe7e4032ae1b8bc03b5208e771c5f7103549/tools/remotecommand/websocket.go#L89-L92
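
The failure above is a subprotocol negotiation mismatch. As a rough illustration, assuming the usual rule that the server picks the first client-requested subprotocol it also supports, the check can be sketched like this. The `negotiate` function is a hypothetical helper, not the golang.org/x/net/websocket or apimachinery implementation, and the error text only imitates the message quoted above.

```go
package main

import "fmt"

// negotiate (hypothetical helper) returns the first client-requested
// subprotocol that the server also supports.
func negotiate(requested, supported []string) (string, error) {
	known := make(map[string]bool, len(supported))
	for _, p := range supported {
		known[p] = true
	}
	for _, p := range requested {
		if known[p] {
			return p, nil
		}
	}
	return "", fmt.Errorf("requested protocol(s) are not supported: %v; supports %v", requested, supported)
}

func main() {
	server := []string{
		"channel.k8s.io",
		"base64.channel.k8s.io",
		"v4.channel.k8s.io",
		"v4.base64.channel.k8s.io",
	}

	// A client that offers only v5 is rejected by a v1-v4 server.
	_, err := negotiate([]string{"v5.channel.k8s.io"}, server)
	fmt.Println(err)

	// A client that also offers v4 can still agree on a protocol.
	p, _ := negotiate([]string{"v5.channel.k8s.io", "v4.channel.k8s.io"}, server)
	fmt.Println(p)
}
```

In this model the handshake only succeeds if the client offers at least one protocol from the server's list, which is why the set of protocols the client requests matters here.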

@duguhaotian
Author

@duguhaotian it returns there: https://github.com/cri-o/cri-o/blob/f40da54704c5c32fb24b2b0e204424afae5b0dc2/vendor/golang.org/x/net/websocket/server.go#L84

With the error message:

requested protocol(s) are not supported: [v5.channel.k8s.io]; supports [ channel.k8s.io base64.channel.k8s.io v4.channel.k8s.io v4.base64.channel.k8s.io]

@duguhaotian is it intentional that runtimes only support v1-v4? Ref: https://github.com/kubernetes/client-go/blob/84a6fe7e4032ae1b8bc03b5208e771c5f7103549/tools/remotecommand/websocket.go#L89-L92

  1. CRI-O does not support v5.channel.k8s.io;
  2. I think runtimes simply have not implemented support for v5.channel.k8s.io yet, and they should support it.

case "spdy":
exec, err = remoteclient.NewSPDYExecutor(config, "POST", url)
case "websocket":
exec, err = remoteclient.NewWebSocketExecutor(config, "GET", url.String())
Member

@saschagrunert saschagrunert Dec 11, 2023

Can we use remoteclient.NewWebSocketExecutorForProtocols and make the protocols CLI configurable?

Ref: kubernetes/kubernetes#122263 (comment)

Author

Of course. Should the default be StreamProtocolV5Name?

Member

@saschagrunert saschagrunert Dec 11, 2023

@duguhaotian yeah, this way we won't need to flip the default again once kubernetes/kubernetes#122263 is resolved.

Author

@duguhaotian duguhaotian Dec 11, 2023

Changed the option to crictl --websocket="x..channel.k8s.io":

  1. if --websocket is not set, SPDY is used;
  2. if WebSocket fails, it falls back to SPDY.

Author

@duguhaotian yeah, this way we won't need to flip the default again once kubernetes/kubernetes#122263 is resolved.

fixed

@aojea

aojea commented Dec 11, 2023

There is a KEP and a plan to gradually move from SPDY to WebSockets: https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/4006-transition-spdy-to-websockets. We should coordinate; @seans3 is leading this effort, and if we implement different bits at different times we risk drifting apart.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: duguhaotian
Once this PR has been reviewed and has the lgtm label, please ask for approval from saschagrunert. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Usage: "protocol of transport, One of: spdy|websocket",
Value: "spdy",
},
&cli.BoolFlag{

Why a flag instead of the fallback, as we do with kubectl and as defined in the KEP?

Flags are hard to remove in the future and become APIs for users; I would also prefer an environment variable instead.

Author

Thank you for the reply. Fixed.

@aojea

aojea commented Dec 16, 2023

I think this should follow the same flow that the kubectl implementation designed in the KEP https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/4006-transition-spdy-to-websockets#proposal-kubectl-websocket-executor-and-fallback-executor

use the legacy by default and have an environment variable to opt in to websockets by default but fallback to the legacy https://github.com/kubernetes/kubernetes/pull/119186/files#diff-62da7750fb41c9d53934dd9706ba719f0c0efdf931b6b36eb46498505e4d3520

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 17, 2023
@duguhaotian
Author

I think this should follow the same flow that the kubectl implementation designed in the KEP https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/4006-transition-spdy-to-websockets#proposal-kubectl-websocket-executor-and-fallback-executor

use the legacy by default and have an environment variable to opt in to websockets by default but fallback to the legacy https://github.com/kubernetes/kubernetes/pull/119186/files#diff-62da7750fb41c9d53934dd9706ba719f0c0efdf931b6b36eb46498505e4d3520

I agree with you; my first version used the fallback executor.
But currently some runtimes (like CRI-O) do not support v5, so this causes CI test cases to fail.

@duguhaotian duguhaotian requested a review from aojea December 17, 2023 06:52
@aojea

aojea commented Dec 17, 2023

But currently some runtimes (like CRI-O) do not support v5, so this causes CI test cases to fail.

Please check in more detail how this is implemented in Kubernetes; I linked the corresponding code and KEP for context.

  1. v5 will only be used when a special env variable is set (the feature flag here is an env variable)
  2. v5 will fall back to v4 if it fails because the container runtime does not implement it, so CI cannot fail because of this: the fallback has to work, otherwise there is a bug in the implementation
  3. if no env variable is set, no new code path is executed
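
The three rules above could be sketched as a small selection step. This is only an illustration under stated assumptions: the variable name is the one used in this PR's test run, `transportOrder` is a hypothetical helper, and the returned strings stand in for real executors.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// envWebsockets is the opt-in variable name used in this PR's test run.
const envWebsockets = "CRICTL_REMOTE_COMMAND_WEBSOCKETS"

// transportOrder (hypothetical helper) lists the transports to try, in order.
// Without the opt-in only the legacy path is returned, so no new code runs.
func transportOrder(getenv func(string) string) []string {
	if strings.ToLower(getenv(envWebsockets)) == "true" {
		// Opted in: try WebSocket first, keep the legacy transport as fallback.
		return []string{"websocket", "spdy"}
	}
	return []string{"spdy"}
}

func main() {
	fmt.Println(transportOrder(os.Getenv))
}
```

Passing the lookup function in as a parameter keeps the decision easy to test without touching the process environment.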

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 16, 2024
@saschagrunert
Member

I think we could at least support the websocket executor in crictl for users to play around with the feature and runtimes.

Comment on lines +73 to +75
func IsEnabledWebsockets() bool {
return strings.ToLower(os.Getenv("CRICTL_REMOTE_COMMAND_WEBSOCKETS")) == "true"
}
Member

Let's remove this

Comment on lines +174 to +176
if !IsEnabledWebsockets() {
return
}
Member

Suggested change
if !IsEnabledWebsockets() {
return
}

@@ -163,8 +164,27 @@ func Exec(ctx context.Context, client internalapi.RuntimeService, opts execOptio
return stream(ctx, opts.stdin, opts.tty, URL)
}

func getExecutor(url *url.URL) (exec remoteclient.Executor, err error) {
Member

I'd say we reintroduce the flag to select the executor.

saschagrunert added a commit to saschagrunert/cri-tools that referenced this pull request Mar 21, 2024
This allows end users to choose the transport to be used as well as
runtime developers to experiment on feature development.

Supersedes: kubernetes-sigs#1305

Signed-off-by: Sascha Grunert <[email protected]>
@saschagrunert
Member

Thank you for the effort on this topic @duguhaotian, we really appreciate it. 🙏

Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature Categorizes issue or PR as related to a new feature. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.